

Language models enable zero-shot prediction of the effects of mutations on protein function

Neural Information Processing Systems

Modeling the effect of sequence variation on function is a fundamental problem for understanding and designing proteins. Since evolution encodes information about function into patterns in protein sequences, unsupervised models of variant effects can be learned from sequence data. The approach to date has been to fit a model to a family of related sequences. The conventional setting is limited, since a new model must be trained for each prediction task. We show that using only zero-shot inference, without any supervision from experimental data or additional training, protein language models capture the functional effects of sequence variation, performing at the state of the art.
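The zero-shot scoring rule this abstract describes can be sketched as a log-ratio: the model's probability of the mutant residue versus the wild-type residue at the masked position. The `toy_masked_distribution` below is a hypothetical stand-in for a real protein language model, used only to make the scoring rule concrete:

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_masked_distribution(sequence, position):
    # Hypothetical stand-in for a masked protein language model: it puts
    # half the probability mass on the wild-type residue and splits the
    # rest evenly. A real model would condition on the full sequence.
    wt = sequence[position]
    probs = {aa: 0.5 / (len(AMINO_ACIDS) - 1) for aa in AMINO_ACIDS}
    probs[wt] = 0.5
    return probs

def mutation_score(sequence, position, mutant_aa):
    """Zero-shot variant score: log p(mutant) - log p(wild-type) at the
    masked site. Negative values suggest the model disfavours the change."""
    probs = toy_masked_distribution(sequence, position)
    wt = sequence[position]
    return math.log(probs[mutant_aa]) - math.log(probs[wt])

score = mutation_score("MKTAYIAKQR", 2, "A")  # score the T->A substitution
```

With a real model the same rule applies unchanged; only the distribution comes from the network instead of the toy function.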


Hierarchical Multi-Label Contrastive Learning for Protein-Protein Interaction Prediction Across Organisms

Liu, Shiyi, Liang, Buwen, Fang, Yuetong, Jiang, Zixuan, Xu, Renjing

arXiv.org Artificial Intelligence

Recent advances in AI for science have highlighted the power of contrastive learning in bridging heterogeneous biological data modalities. Building on this paradigm, we propose HIPPO (HIerarchical Protein-Protein interaction prediction across Organisms), a hierarchical contrastive framework for protein-protein interaction (PPI) prediction, where protein sequences and their hierarchical attributes are aligned through multi-tiered biological representation matching. The proposed approach incorporates hierarchical contrastive loss functions that emulate the structured relationship among functional classes of proteins. The framework adaptively incorporates domain and family knowledge through a data-driven penalty mechanism, enforcing consistency between the learned embedding space and the intrinsic hierarchy of protein functions. Experiments on benchmark datasets demonstrate that HIPPO achieves state-of-the-art performance, outperforming existing methods and showing robustness in low-data regimes. Notably, the model demonstrates strong zero-shot transferability to other species without retraining, enabling reliable PPI prediction and functional inference even in less characterized or rare organisms where experimental data are limited. Further analysis reveals that hierarchical feature fusion is critical for capturing conserved interaction determinants, such as binding motifs and functional annotations. This work advances cross-species PPI prediction and provides a unified framework for interaction prediction in scenarios with sparse or imbalanced multi-species data.
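The hierarchical contrastive idea can be sketched as an InfoNCE-style loss in which negatives are weighted by their hierarchical distance to the anchor. The `1 + level` weighting below is an illustrative assumption, not HIPPO's actual penalty mechanism:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def hierarchical_nce(anchor, positive, negatives, tau=0.1):
    """negatives: list of (embedding, level) pairs, where `level` is the
    hierarchical distance to the anchor (e.g. 1 = same family, 3 = unrelated).
    Each negative is weighted 1 + level, so proteins that are hierarchically
    distant are penalised more heavily for being close in embedding space."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum((1 + level) * math.exp(cosine(anchor, emb) / tau)
              for emb, level in negatives)
    return -math.log(pos / (pos + neg))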


Prot2Chat: Protein LLM with Early Fusion of Sequence and Structure

Wang, Zhicong, Ma, Zicheng, Cao, Ziqiang, Zhou, Changlong, Zhang, Jun, Gao, Yiqin

arXiv.org Artificial Intelligence

Proteins play a pivotal role in living organisms, yet understanding their functions presents significant challenges, including the limited flexibility of classification-based methods, the inability to effectively leverage spatial structural information, and the lack of systematic evaluation metrics for protein Q&A systems. To address these limitations, we propose Prot2Chat, a novel framework that integrates multimodal protein representations with natural language through a unified module, enabling large language model (LLM)-driven answer generation. Our model incorporates a modified ProteinMPNN encoder, which encodes protein sequence and structural information in a unified manner, a protein-text adapter with cross-attention mechanisms, and a LLaMA3 decoder. To optimize training efficiency, we freeze the encoder and employ LoRA techniques for the decoder. We conducted experiments on two datasets; both automated metrics and expert evaluations demonstrate the superior performance of our model. Furthermore, zero-shot prediction results highlight its strong generalization capabilities. This framework offers a promising solution for bridging protein domain knowledge with natural language understanding, paving the way for transformative advancements in protein-related research.
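The adapter's core operation, text tokens attending over per-residue protein embeddings, can be sketched as single-head cross-attention. This projection-free version is a schematic assumption, not Prot2Chat's actual module:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(text_queries, protein_keys, protein_values):
    """Each text-token query attends over per-residue protein embeddings
    and returns one fused vector per text token. Single-head, with no
    learned projections -- a schematic of the adapter's core step only."""
    d = len(protein_keys[0])
    fused = []
    for q in text_queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in protein_keys]
        weights = softmax(scores)
        fused.append([sum(w * v[j] for w, v in zip(weights, protein_values))
                      for j in range(len(protein_values[0]))])
    return fused
```

In the full framework the fused vectors would be fed to the (LoRA-tuned) decoder; here they simply show how text queries pool structural information from the residue embeddings.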


Geometric Self-Supervised Pretraining on 3D Protein Structures using Subgraphs

Chatzianastasis, Michail, Dasoulas, George, Vazirgiannis, Michalis

arXiv.org Artificial Intelligence

Protein representation learning aims to learn informative protein embeddings capable of addressing crucial biological questions, such as protein function prediction. Although sequence-based transformer models have shown promising results by leveraging the vast amount of protein sequence data in a self-supervised way, there is still a gap in applying these methods to 3D protein structures. In this work, we propose a pre-training scheme that goes beyond trivial masking by leveraging the 3D and hierarchical structure of proteins. We propose a novel self-supervised method to pretrain 3D graph neural networks on 3D protein structures, by predicting the distances between local geometric centroids of protein subgraphs and the global geometric centroid of the protein. The motivation for this method is twofold. First, the relative spatial arrangements and geometric relationships among different regions of a protein are crucial for its function. Moreover, proteins are often organized in a hierarchical manner, where smaller substructures, such as secondary structure elements, assemble into larger domains. By considering subgraphs and their relationships to the global protein structure, the model can learn to reason about these hierarchical levels of organization. We experimentally show that our proposed pretraining strategy leads to significant improvements in the performance of 3D GNNs in various protein classification tasks.
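The pretraining targets described above can be sketched directly: for each subgraph, the Euclidean distance from its local geometric centroid to the protein's global centroid. The function and subgraph format below are illustrative; in the paper's setting a GNN would be trained to predict these distances from the 3D graph:

```python
import math

def centroid(coords):
    # Geometric centroid of a list of 3D points.
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))

def subgraph_centroid_targets(residue_coords, subgraphs):
    """residue_coords: one (x, y, z) tuple per residue; subgraphs: lists of
    residue indices. Returns, per subgraph, the distance between its local
    centroid and the protein's global centroid -- the self-supervised
    regression target sketched here."""
    global_c = centroid(residue_coords)
    targets = []
    for indices in subgraphs:
        local_c = centroid([residue_coords[i] for i in indices])
        targets.append(math.dist(local_c, global_c))
    return targets
```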


Scientific Large Language Models: A Survey on Biological & Chemical Domains

Zhang, Qiang, Ding, Keyang, Lyv, Tianwen, Wang, Xinda, Yin, Qingyu, Zhang, Yiwen, Yu, Jing, Wang, Yuhao, Li, Xiaotong, Xiang, Zhuoyi, Zhuang, Xiang, Wang, Zeyuan, Qin, Ming, Zhang, Mengyao, Zhang, Jinlu, Cui, Jiyu, Xu, Renjun, Chen, Hongyang, Fan, Xiaohui, Xing, Huabin, Chen, Huajun

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension, representing a significant stride toward artificial general intelligence. The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized linguistic systems developed within various scientific disciplines. This growing interest has led to the advent of scientific LLMs, a novel subclass specifically engineered for facilitating scientific discovery. As a burgeoning area in the community of AI for Science, scientific LLMs warrant comprehensive exploration. However, a systematic and up-to-date survey introducing them is currently lacking. In this paper, we endeavor to methodically delineate the concept of "scientific language", whilst providing a thorough review of the latest advancements in scientific LLMs. Given the expansive realm of scientific disciplines, our analysis adopts a focused lens, concentrating on the biological and chemical domains. This includes an in-depth examination of LLMs for textual knowledge, small molecules, macromolecular proteins, genomic sequences, and their combinations, analyzing them in terms of model architectures, capabilities, datasets, and evaluation. Finally, we critically examine the prevailing challenges and point out promising research directions along with the advances of LLMs. By offering a comprehensive overview of technical developments in this field, this survey aspires to be an invaluable resource for researchers navigating the intricate landscape of scientific LLMs.


Predicting Protein Functions using Machine Learning

#artificialintelligence

The CAFA competitions use the Gene Ontology (GO) as the mechanism to annotate proteins. The ontology is a controlled vocabulary used to precisely define each term in question. The GO, in particular, defines terms for three different protein-related aspects: molecular function, biological process, and cellular component. The GO is an "always evolving" set, as new terms are incorporated, refined, or made obsolete. The terms in the GO are organized in a hierarchical structure (a directed acyclic graph) where each of them is given a unique identifier.
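The hierarchical nature of GO annotation can be sketched with a tiny parent map: annotating a protein with a term implicitly annotates it with all ancestors of that term (the true-path rule used when CAFA predictions are propagated). The map below uses a few real molecular-function identifiers but is far smaller than the real ontology:

```python
# Minimal GO-style parent map: child term -> list of parent terms.
go_parents = {
    "GO:0004672": ["GO:0016301"],  # protein kinase activity is_a kinase activity
    "GO:0016301": ["GO:0003824"],  # kinase activity is_a catalytic activity
    "GO:0003824": ["GO:0003674"],  # catalytic activity is_a molecular_function
    "GO:0003674": [],              # root of the molecular-function aspect
}

def propagate(term, parents=go_parents):
    """Return the term plus all of its ancestors: the full set of
    annotations implied by assigning `term` to a protein."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen
```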


A Review of Deep Learning Techniques for Protein Function Prediction

Aggarwal, Divyanshu, Hasija, Yasha

arXiv.org Artificial Intelligence

Deep learning and big data have shown tremendous success in bioinformatics and computational biology in recent years, and artificial intelligence methods have also contributed significantly to the task of protein function classification. This review analyzes recent developments in approaches to predicting protein function using deep learning. We explain the importance of determining protein function and why automating this task is crucial. After reviewing the deep learning techniques widely used for this task, we highlight the emergence of modern state-of-the-art (SOTA) deep learning models, which have achieved groundbreaking results in computer vision, natural language processing, and multi-modal learning in the last few years. We hope this review provides a broad view of the current role and advances of deep learning in the biological sciences, especially in protein function prediction, and encourages new researchers to contribute to this area.


Machine learning helps predict protein functions

#artificialintelligence

To engineer proteins for specific functions, scientists change a protein sequence and experimentally test how that change alters its function. Because there are too many possible amino acid sequence changes to test them all in the laboratory, researchers build computational models that predict protein function based on amino acid sequences. Scientists have now combined multiple machine learning approaches for building a simple predictive model that often works better than established, complex methods.
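The combination idea mentioned above can be sketched as a plain prediction average over several simple models; the stand-in predictors below are hypothetical placeholders for fitted regressors:

```python
def ensemble_predict(models, sequence):
    """Average the predictions of several regressors, each mapping a
    protein sequence to a predicted activity. Real components might be a
    ridge regression, a k-NN model, and a position-weight baseline."""
    preds = [model(sequence) for model in models]
    return sum(preds) / len(preds)

# Hypothetical stand-in predictors returning fixed activities:
stand_ins = [lambda seq: 0.8, lambda seq: 1.0, lambda seq: 1.2]
```

Averaging is the simplest way to combine approaches; disagreement among the component predictions can also flag sequences worth testing in the laboratory.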


Leveraging Sequence Embedding and Convolutional Neural Network for Protein Function Prediction

Tseng, Wei-Cheng, Chi, Po-Han, Wu, Jia-Hua, Sun, Min

arXiv.org Artificial Intelligence

The capability to accurately predict protein functions and properties is essential in the biotechnology industry, e.g. in drug development and artificial protein synthesis. The main challenges of protein function prediction are the large label space and the lack of labeled training data. Our method leverages unsupervised sequence embedding and the success of deep convolutional neural networks to overcome these challenges. In contrast, most existing methods shrink the label space by discarding rare protein functions. Furthermore, some existing methods require additional bio-information (e.g., the 3-dimensional structure of the proteins), which is difficult to determine through biochemical experiments. Our proposed method significantly outperforms the other methods on the publicly available benchmark using only protein sequences as input, speeding up the process of identifying protein functions.
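The convolutional step such models rely on can be sketched as a single 1D filter sliding over per-residue embeddings, followed by global max pooling; the toy embeddings and kernel in the test are illustrative, since a real model learns both:

```python
def conv1d_maxpool(embeddings, kernel):
    """embeddings: list of per-residue vectors; kernel: list of vectors of
    the same width spanning len(kernel) consecutive residues. Slides the
    filter over every window, then applies global max pooling so the
    result is position-independent -- the core of CNN-based protein
    function predictors."""
    k = len(kernel)
    activations = []
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        activations.append(sum(sum(w * x for w, x in zip(kw, ew))
                               for kw, ew in zip(kernel, window)))
    return max(activations)  # global max pooling over positions
```

A full predictor would stack many such filters and feed the pooled activations to a classifier over the (large) function label space.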